Now that you've mastered the core decision of selecting the next edge in Prim's algorithm, let's explore the data structures that allow a computer to make that same decision efficiently. To translate the algorithm's logic into code, we need three key components.
- 1. Graph Representation: We need a way to store the graph $G = (V, E)$. An Adjacency List is ideal, especially for sparse graphs, as it efficiently stores neighbors for each vertex. This is typically more memory-efficient than an Adjacency Matrix.
- 2. Visited Set: To avoid cycles and track progress, we must maintain a set of vertices already included in the MST. A hash set provides $O(1)$ average time complexity for additions and lookups, making it perfect for checking if a vertex is already in our tree.
- 3. Priority Queue: The heart of an efficient implementation is finding the minimum-weight frontier edge. A Min-Heap (Priority Queue) is the optimal data structure for this task. It allows us to add all frontier edges and retrieve the one with the smallest weight in $O(\log V)$ time, leading to an overall complexity of $O(E \log V)$.
Python: Data Structure Initialization
1import heapq
2
3# 1. Graph Representation (Adjacency List)
4# Using a portion of our Shared_Graph example
5graph = {
6 'A': [('B', 4), ('C', 3)],
7 'B': [('A', 4), ('C', 1), ('D', 5)],
8 'C': [('A', 3), ('B', 1), ('D', 2)],
9 'D': [('B', 5), ('C', 2)]
10}
11
12# Initial setup starting from vertex 'A'
13start_node = 'A'
14
15# 2. Visited Set
16visited = {start_node}
17
18# 3. Priority Queue (Min-Heap) for frontier edges
19# Stores (weight, from_vertex, to_vertex)
20pq = []
21for neighbor, weight in graph[start_node]:
22 heapq.heappush(pq, (weight, start_node, neighbor))
23
24print(f"Visited Set: {visited}")
25print(f"Priority Queue (Heap): {pq}")